COVARIANCE MATRIX OF MULTIVARIATE REWARD PROCESSES WITH NONLINEAR REWARD FUNCTIONS

Authors: not recorded
Abstract:

Multivariate reward processes with constant-rate reward functions, defined on a semi-Markov process, were first studied by Masuda and Sumita (1991). Reward processes with nonlinear reward functions were introduced by Soltani (1996). In this work we study a multivariate process whose components are reward processes with nonlinear reward functions. The Laplace transform of the covariance matrix of this process is specified for given reward functions, and if these are real analytic functions, then the covariance matrix is fully specified. In particular, this result provides an explicit formula for the variances of univariate reward processes. We also view the covariance matrix as a solution of a renewal equation.
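As a rough illustration of the object the abstract describes, the sketch below estimates the covariance matrix of a two-component reward process at a fixed time by Monte Carlo simulation. It is not the paper's method, which works through Laplace transforms. Everything concrete here is an assumption made for the example: the two-state chain `P`, the exponential sojourn times, the reward functions `rho_1` and `rho_2`, and the convention that a sojourn of length tau in state i contributes rho(i, tau), with the final sojourn truncated at the observation time.

```python
import math
import random

def simulate_rewards(t_end, P, sojourn, reward_fns, rng, start=0):
    """Simulate one semi-Markov path on [0, t_end] and return the
    accumulated reward of each component (assumed convention: a sojourn
    of length tau in state i contributes rho(i, tau))."""
    state, clock = start, 0.0
    totals = [0.0] * len(reward_fns)
    while True:
        tau = sojourn(state, rng)
        stay = min(tau, t_end - clock)  # truncate the last sojourn at t_end
        for k, rho in enumerate(reward_fns):
            totals[k] += rho(state, stay)
        clock += tau
        if clock >= t_end:
            return totals
        # Jump to the next state according to row `state` of P.
        state = rng.choices(range(len(P)), weights=P[state])[0]

def covariance_matrix(samples):
    """Unbiased sample covariance matrix of a list of reward vectors."""
    n, d = len(samples), len(samples[0])
    means = [sum(s[k] for s in samples) / n for k in range(d)]
    return [[sum((s[a] - means[a]) * (s[b] - means[b]) for s in samples) / (n - 1)
             for b in range(d)] for a in range(d)]

# Hypothetical model: two alternating states with exponential sojourns.
rng = random.Random(0)
P = [[0.0, 1.0], [1.0, 0.0]]
rates = [1.0, 2.0]
sojourn = lambda i, r: r.expovariate(rates[i])

# Hypothetical nonlinear reward functions rho(state, sojourn_length).
rho_1 = lambda i, tau: tau ** 2 if i == 0 else tau
rho_2 = lambda i, tau: math.sqrt(tau)

samples = [simulate_rewards(5.0, P, sojourn, [rho_1, rho_2], rng)
           for _ in range(2000)]
cov = covariance_matrix(samples)
print(cov)
```

A quick sanity check on this convention: with the linear reward rho(i, tau) = tau in every state, the accumulated reward equals the elapsed time, so its variance is zero.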

Similar resources

A correction term for the covariance of renewal-reward processes with multivariate rewards

We consider a renewal-reward process with multivariate rewards. Such a process is constructed from an i.i.d. sequence of time periods, to each of which there is associated a multivariate reward vector. The rewards in each time period may depend on each other and on the period length, but not on the other time periods. Rewards are accumulated to form a vector valued process that exhibits jumps i...

Asymptotics for renewal-reward processes with retrospective reward structure

Let $\{(X_i, Y_i) : i = \ldots, -1, 0, 1, \ldots\}$ be a doubly infinite renewal-reward process, where $\{X_i : i = \ldots, -1, 0, 1, \ldots\}$ is an i.i.d. sequence of renewal cycle lengths and $Y_i = g(X_{i-q}, X_{i-q+1}, \ldots, X_i)$ is the lump reward earned at the end of the $i$th renewal cycle, with some function $g : \mathbb{R}^{q+1} \to \mathbb{R}$. Starting with the first renewal cycle (of duration $X_1$) at the time origin, let $C(t)$ denote the e...

Markov Decision Processes with Arbitrary Reward Processes

We consider a learning problem where the decision maker interacts with a standard Markov decision process, with the exception that the reward functions vary arbitrarily over time. We show that, against every possible realization of the reward process, the agent can perform as well—in hindsight—as every stationary policy. This generalizes the classical no-regret result for repeated games. Specif...

Volume 14, Issue 2

Publication date: 2003-06-01
